Abstract

In video based face recognition, face images are typically 
captured over multiple frames in uncontrolled conditions, 
where head pose, illumination, shadowing, motion blur and 
focus change over the sequence. Additionally, inaccuracies 
in face localisation can also introduce scale and alignment 
variations. Using all face images, including images of poor 
quality, can actually degrade face recognition performance. 
While one solution is to use only the 'best' subset of images,
current face selection techniques are incapable of simulta- 
neously handling all of the abovementioned issues. We pro- 
pose an efficient patch-based face image quality assessment 
algorithm which quantifies the similarity of a face image 
to a probabilistic face model, representing an 'ideal' face. 
Image characteristics that affect recognition are taken into 
account, including variations in geometric alignment (shift, 
rotation and scale), sharpness, head pose and cast shad- 
ows. Experiments on FERET and PIE datasets show that 
the proposed algorithm is able to identify images which are 
simultaneously the most frontal, aligned, sharp and well 
illuminated. Further experiments on a new video surveil- 
lance dataset (termed ChokePoint) show that the proposed 
method provides better face subsets than existing face se- 
lection techniques, leading to significant improvements in 
recognition accuracy. 



1. Introduction 

Video-based identity inference in surveillance conditions 
is challenging due to a variety of factors, including the 
subjects' motion, the uncontrolled nature of the subjects, 
variable lighting, and poor quality CCTV video recordings. 
This results in issues for face recognition such as low reso- 
lution, blurry images (due to motion or loss of focus), large 
pose variations, and low contrast [14, 35, 40]. While recent 
face recognition algorithms can handle faces with moder- 
ately challenging illumination conditions [15, 17, 24, 28], 
strong illumination variations (causing cast shadows and 
self-shadowing) remain problematic [30].
One approach to overcome the impact of poor quality
images is to assume that such images are outliers in a se- 
quence. This includes approaches like exemplar extraction 
using clustering techniques (eg. k-means clustering [13]) 
and statistical model approaches for outlier removal [6]. 
However, these approaches are not likely to work when 
most of the images in the sequence have poor quality — the 
good quality images would actually be classified as outliers. 

Another approach is explicit subset selection, where a 
face quality assessment is automatically made on each im- 
age, either to remove poor quality face images, or to se- 
lect a subset comprised of high quality images [10, 21, 32]. 
This improves recognition performance, with the additional 
benefit of reducing the overall computation load during fea- 
ture extraction and matching [19]. The challenge in this 
approach is finding a good definition for "face quality". 

Several face image standards have been proposed for 
face quality assessment (eg. ISO/IEC 19794-5 [1] and 
ICAO 9303 [2]). In these standards, quality can be divided 
into: (i) image specific qualities such as sharpness, contrast, 
compression artifacts, and (ii) face specific qualities such as
face geometry, pose, eye detectability, illumination angles. 

Based in part on the above standards, many approaches 
have been proposed to analyse various face and image 
properties. For example, face pose estimation using tree 
structured multiple pose estimators [38], and face align- 
ment estimation using template matching [7]. Asymme- 
try analysis has been proposed to simultaneously estimate 
two qualities: out-of-plane rotation and non-frontal illumi- 
nation [10, 29, 39]. 

Since face recognition performance is simultaneously 
impacted by multiple factors, being able to detect one or two 
qualities is insufficient for robust subset selection. One ap- 
proach to simultaneously detect multiple quality character- 
istics is through a fusion of individual face and image qual- 
ity measurements. Nasrollahi and Moeslund [21] proposed 
a weighted quality fusion approach to combine out-of-plane
rotation, sharpness, brightness, and image resolution quali- 
ties. Rua et al. [26] proposed a similar quality assessment 
approach, by using asymmetry analysis and two sharpness 
measurements. Hsu et al. [16] proposed to learn fusion pa- 
rameters on multiple quality scores to achieve maximum 
correlation with matching scores between face pairs. Another
proposed fusion approach uses a Bayesian network to
model the relationships among qualities, image features and
matching scores [22]. The main drawbacks of the above fusion
approaches are:

• Fusion-based approaches only perform as well as their 
individual classifiers. For example, if a pose estima- 
tion algorithm requires accurate facial feature localisa- 
tion, the whole fusion framework will fail in the cases 
where that pose algorithm fails (such as in low resolution
CCTV footage) [34].

• As various properties are measured individually and 
have different influence on face quality, it may be dif- 
ficult to combine them to output a single quality score 
for the purposes of image selection. 

• As multiple classifiers are involved, they are typically
more time consuming and hence may not be suitable 
for real-time surveillance applications. 

• Since face matching scores are heavily dependent on
system-specific details (including the input features, 
matching algorithms and training images), quality as- 
sessment approaches that learn a fusion model based 
on match scores end up being closely tied to the par- 
ticular system configuration and hence need to be re- 
trained for each system. 

Simultaneously detecting multiple quality characteristics 
can also be accomplished by learning a generic model to 
define the 'ideal' quality. Luo [18] proposed a learning 
based approach where the quality model is trained to corre- 
late with manually labelled quality scores. However, given 
the subjective nature of human labelling, and the fact that 
humans may not know what characteristics work best for 
automatic face recognition algorithms, this approach may 
not generate the best quality model for face recognition. 

In this paper we propose a straightforward and effective 
patch-based face quality assessment algorithm, targeted to- 
wards handling images obtained in surveillance conditions. 
It quantifies the similarity of a given face to a probabilistic 
face model, representing an 'ideal' face, via patch-based lo- 
cal analysis. Without resorting to fusion, the proposed algo- 
rithm outputs a single score for each image, with the score 
simultaneously reflecting the degree of alignment errors, 
pose variations, shadowing, and image sharpness (under- 
lying resolution). Localisation of facial features (ie. eyes, 
nose, mouth) is not required. 

We continue the paper as follows. In Section 2 we de- 
scribe the proposed quality assessment algorithm. Still im- 
age and video datasets used in the experiments are briefly 
described in Section 3. Extensive performance comparisons
against existing techniques are given in Section 4
(on still images) and Section 5 (on surveillance videos). 
The main findings are discussed in Section 6. 



2. Probabilistic Face Quality Assessment 

The proposed algorithm is comprised of five steps: 
(1) pixel-based image normalisation, (2) patch extraction 
and normalisation, (3) feature extraction from each patch, 
(4) local probability calculation, (5) overall quality score 
generation via integration of local probabilities. These steps 
are elaborated below: 

1. For a given image $I$, we perform non-linear preprocessing
(log transform) to reduce the dynamic range of data.
Following [9], the normalised image $I_{\log}$ is calculated
using:

$$I_{\log}(r,c) = \ln\left[ I(r,c) + 1 \right] \qquad (1)$$

where $I(r,c)$ is the pixel intensity located at $(r,c)$.
Logarithm normalisation amplifies low intensity pixels
and compresses high intensity pixels. This property is
helpful in reducing the intensity differences between
skin tones.

2. The transformed image $I_{\log}$ is divided into $N$ overlapping
blocks (patches). Each block $b_i$ has a size of $n \times n$
pixels and overlaps neighbouring blocks by $t$ pixels. To
accommodate contrast variations between face images,
each patch is normalised to have zero mean and
unit variance [36].

3. From each block, a 2D Discrete Cosine Transform 
(DCT) feature vector is extracted [11]. Excluding the 
0-th DCT component (as it has no information due to 
the previous normalisation), the top d low frequency 
components are retained. The low frequency compo- 
nents retain generic facial textures [12], while largely 
omitting person-specific information. At the same
time, cast shadows [36] as well as variations in pose 
and alignment can alter the local textures. 

4. For each block location $i$, the probability of the corresponding
feature vector $x_i$ is calculated using a location
specific probabilistic model:

$$p(x_i \mid \mu_i, \Sigma_i) = \frac{\exp\left[ -\frac{1}{2} (x_i - \mu_i)^T \Sigma_i^{-1} (x_i - \mu_i) \right]}{(2\pi)^{d/2} \, |\Sigma_i|^{1/2}} \qquad (2)$$

where $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix
of a normal distribution. The model for each location is
trained using a pool of frontal faces with frontal illumination
and neutral expression. All of the training face
images are first scaled and aligned to a fixed size, with
each eye located at a fixed location. We emphasise that
during testing, the faces do not need to be aligned.

5. By assuming that the model for each location is independent,
an overall probabilistic quality score $Q$ for
image $I$, comprised of $N$ blocks, is calculated using:

$$Q(I) = \sum_{i=1}^{N} \log p(x_i \mid \mu_i, \Sigma_i) \qquad (3)$$

The resulting quality score represents the probabilistic 
similarity of a given face to an "ideal" face (as represented 
by a set of training images). A higher quality score reflects 
better image quality. 
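
The five steps of this section can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the implementation used in the experiments: the function names are hypothetical, the three retained DCT coefficients are assumed to follow zig-zag order, a flat block is assumed to map to an all-zero patch, and the per-location models (mean, covariance) are assumed to have been trained beforehand.

```python
import math


def log_normalise(image):
    """Step 1, Eq. (1): I_log(r, c) = ln(I(r, c) + 1)."""
    return [[math.log(pixel + 1.0) for pixel in row] for row in image]


def extract_patches(image, n=8, overlap=7):
    """Step 2: overlapping n x n blocks, each with zero mean and unit variance."""
    step = n - overlap  # requires overlap < n
    patches = []
    for r in range(0, len(image) - n + 1, step):
        for c in range(0, len(image[0]) - n + 1, step):
            block = [image[r + i][c + j] for i in range(n) for j in range(n)]
            mean = sum(block) / len(block)
            var = sum((v - mean) ** 2 for v in block) / len(block)
            std = math.sqrt(var) if var > 0.0 else 1.0  # assumption: flat block -> zeros
            patches.append([(v - mean) / std for v in block])
    return patches


def dct_features(patch, n=8, coeffs=((0, 1), (1, 0), (2, 0))):
    """Step 3: selected low-frequency 2D DCT-II coefficients; (0, 0) is skipped.

    The default coefficient indices assume zig-zag ordering (an assumption).
    """
    feats = []
    for u, v in coeffs:
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        scale *= math.sqrt((1.0 if v == 0 else 2.0) / n)
        total = 0.0
        for i in range(n):
            cu = math.cos((2 * i + 1) * u * math.pi / (2 * n))
            for j in range(n):
                cv = math.cos((2 * j + 1) * v * math.pi / (2 * n))
                total += patch[i * n + j] * cu * cv
        feats.append(scale * total)
    return feats


def gaussian_logpdf(x, mean, cov):
    """Step 4: logarithm of Eq. (2) for a full covariance matrix."""
    d = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Eliminate on [cov | diff]; cov is assumed symmetric positive definite,
    # so no pivoting is needed and det(cov) is the product of the pivots.
    a = [list(row) + [diff[k]] for k, row in enumerate(cov)]
    log_det = 0.0
    for k in range(d):
        log_det += math.log(a[k][k])
        for r in range(k + 1, d):
            factor = a[r][k] / a[k][k]
            for c in range(k, d + 1):
                a[r][c] -= factor * a[k][c]
    solved = [0.0] * d  # back-substitution gives cov^{-1} diff
    for k in reversed(range(d)):
        rest = sum(a[k][c] * solved[c] for c in range(k + 1, d))
        solved[k] = (a[k][d] - rest) / a[k][k]
    mahalanobis = sum(di * si for di, si in zip(diff, solved))
    return -0.5 * (mahalanobis + log_det + d * math.log(2.0 * math.pi))


def quality_score(image, models, n=8, overlap=7):
    """Step 5, Eq. (3): sum of log-probabilities over all block locations.

    models[i] is the (mean, covariance) pair trained for block location i.
    """
    patches = extract_patches(log_normalise(image), n, overlap)
    if len(patches) != len(models):
        raise ValueError("one model is needed per block location")
    return sum(
        gaussian_logpdf(dct_features(patch, n), mean, cov)
        for patch, (mean, cov) in zip(patches, models)
    )
```

Working with log-probabilities keeps the sum in Eq. (3) numerically stable even when individual block probabilities are very small.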

3. Face Datasets 

In this section, we briefly describe the FERET, PIE and 
ChokePoint face datasets, as well as their setup for our ex- 
periments. 

FERET [23] and PIE [31] are used to analyse how accu- 
rate the proposed quality assessment algorithm is for cor- 
rectly selecting best quality images with several desired 
characteristics, compared to other existing methods. In to- 
tal, there are 1124 unique subjects in the training phase and 
1263 subjects in the test phase. 

The ChokePoint dataset contains surveillance videos. It 
is used to study the improvement in verification perfor- 
mance gained from subset selection, using the proposed 
quality method as well as other approaches. 

3.1. Setup of Still Image Datasets: FERET and PIE 

To study the performance of the proposed method in 
terms of correctly selecting images with desired characteris- 
tics, we simulated blurring as well as four alignment errors 
using images from the 'fb' subset of FERET. Experiments 
with pose variations (out-of-plane rotation) used dedicated 
subsets from FERET and PIE. Experiments with cast shad- 
ows used the illumination subset of PIE. 

The generated alignment errors¹ are: horizontal shift and
vertical shift (using displacements of 0, ±2, ±4, ±6, ±8 pixels),
in-plane rotation (using rotations of 0°, ±10°, ±20°,
±30°), and scale variations (using scaling factors of 0.7, 0.8,
0.9, 1.0, 1.1, 1.2, 1.3). For sharpness variations, each original
image is first downscaled to three sizes (48 x 48, 32 x 32 and
16 x 16 pixels) then rescaled to the baseline size of 64 x 64
pixels. See Fig. 1 for examples.
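
The sharpness variation described above (downscale, then rescale to the baseline size) can be sketched as follows. The interpolation method is not stated in the text, so bilinear interpolation is an assumption here, and the helper names `resize_bilinear` and `simulate_blur` are hypothetical.

```python
def resize_bilinear(image, new_h, new_w):
    """Resize a 2D list of intensities with bilinear interpolation (assumed method)."""
    h, w = len(image), len(image[0])

    def taps(k, new_size, size):
        # Map output index k to a source coordinate using pixel centres,
        # clamped to the valid range, then split into neighbours and weight.
        pos = min(max((k + 0.5) * size / new_size - 0.5, 0.0), size - 1.0)
        lo = int(pos)
        return lo, min(lo + 1, size - 1), pos - lo

    out = []
    for r in range(new_h):
        y0, y1, fy = taps(r, new_h, h)
        row = []
        for c in range(new_w):
            x0, x1, fx = taps(c, new_w, w)
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def simulate_blur(image, small_size):
    """Downscale to small_size x small_size, then rescale to the original size."""
    h, w = len(image), len(image[0])
    small = resize_bilinear(image, small_size, small_size)
    return resize_bilinear(small, h, w)
```

For a 64 x 64 face, calling `simulate_blur(face, 48)`, `simulate_blur(face, 32)` and `simulate_blur(face, 16)` would give the three reduced-sharpness versions, with smaller intermediate sizes removing more high frequency detail.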

FERET provides the dedicated 'b' subset with pose variations,
containing out-of-plane rotations of 0°, ±15°, ±25°,
±40°, ±60°. PIE also provides a dedicated subset with 
pose variations, though with a smaller set of rotations (0°, 
±22.5°, ±45°, ±67.5°). 

The illumination subset of PIE was used to assess performance
in various cast shadow conditions. In our experiments,
we divided the frontal view images into six subsets²
based on the angle of the corresponding light source. Subset 1
has the most frontal light sources, while subset 6 has
the largest light source angles (54°-67°). See Fig. 2 for examples.
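
The grouping of PIE light sources into six subsets, as listed in the accompanying footnote, can be held in a small lookup table. The sketch below copies the light source numbers from that footnote; the variable and function names are hypothetical.

```python
# Light source numbers per illumination subset, copied from the footnote.
ILLUMINATION_SUBSETS = {
    1: (8, 11, 20),
    2: (6, 7, 9, 12, 19, 21),
    3: (5, 10, 13, 14),
    4: (18, 22),
    5: (4, 15),
    6: (2, 3, 16, 17),
}

# Reverse lookup: light source number -> subset index.
SUBSET_OF_LIGHT = {
    light: subset
    for subset, lights in ILLUMINATION_SUBSETS.items()
    for light in lights
}


def subset_for_light(light_source):
    """Return the illumination subset (1 to 6) of a PIE light source number."""
    return SUBSET_OF_LIGHT[light_source]
```

Each of the light sources 2 to 22 appears in exactly one subset, so the reverse lookup is unambiguous.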




Figure 1. Examples of simulated image variations on FERET.
Panels: aligned, horizontal shift, vertical shift, in-plane
rotation, scale change, blurring.




¹ The generated alignment errors are representative of real-life
characteristics of automatic face localisation/detection algorithms [25].

² Subset 1: light source 8, 11, 20; Subset 2: light source 6, 7, 9, 12,
19, 21; Subset 3: light source 5, 10, 13, 14; Subset 4: light source 18, 22;
Subset 5: light source 4, 15; Subset 6: light source 2, 3, 16, 17.



Figure 2. Examples from PIE with strong directed illumination,
causing self-shadowing. Panels: Subset 1 (0°), Subset 2 (16°-21°),
Subset 3 (31°-32°), Subset 4 (37°-38°), Subset 5 (44°-47°),
Subset 6 (54°-67°).

3.2. Surveillance Videos: ChokePoint Dataset 

We collected a video dataset³, termed ChokePoint, designed
for experiments in person identification/verification
under real-world surveillance conditions using existing 
technologies. An array of three cameras was placed above 
several portals (natural choke points in terms of pedestrian 
traffic) to capture subjects walking through each portal in a 
natural way (see Figs. 3 and 4). 

While a person is walking through a portal, a sequence 
of face images (ie. a face set) can be captured. Faces in such 
sets will have variations in terms of illumination conditions, 
pose, sharpness, as well as misalignment due to automatic 
face localisation/detection [25]. Due to the three camera 
configuration, one of the cameras is likely to capture a face 
set where a subset of the faces is near-frontal. 

The dataset consists of 25 subjects (19 male and 6 fe- 
male) in portal 1 and 29 subjects (23 male and 6 female) 
in portal 2. In total, it consists of 48 video sequences and 
64,204 face images. Each sequence was named according 
to the recording conditions (eg. P2E_S1_C3) where P, S, 
and C stand for portal, sequence and camera, respectively. 
E and L indicate subjects either entering or leaving the por- 
tal. The numbers indicate the respective portal, sequence 
and camera label. For example, P2L_S1_C3 indicates that 
the recording was done in Portal 2, with people leaving the 
portal, and captured by camera 3 in the first recorded se- 
quence. 
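
The naming convention above is regular enough to parse mechanically. The following sketch, with the hypothetical helper `parse_sequence_name`, splits a sequence name into its four fields.

```python
import re
from typing import NamedTuple


class SequenceName(NamedTuple):
    portal: int
    direction: str  # "E" for entering, "L" for leaving
    sequence: int
    camera: int


# P<portal><E|L>_S<sequence>_C<camera>, e.g. P2L_S1_C3
_NAME_PATTERN = re.compile(r"^P(\d+)([EL])_S(\d+)_C(\d+)$")


def parse_sequence_name(name):
    """Split a ChokePoint sequence name into portal, direction, sequence, camera."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid sequence name: {name!r}")
    portal, direction, sequence, camera = match.groups()
    return SequenceName(int(portal), direction, int(sequence), int(camera))
```

For example, `parse_sequence_name("P2L_S1_C3")` yields portal 2, direction "L", sequence 1 and camera 3.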

In this paper, all the experiments were performed with 
the video-to-video verification protocol. In this protocol, 
video sequences are divided into two groups (G1 and G2),
where each group played the role of development set and 
evaluation set in turn. Parameters can be first learned on the 
development set and then applied on the evaluation set. The 
average verification rate is used for reporting results. In our 
experiments we selected the frontal view cameras (shown in 
Table 1). In each group, each sequence takes a turn to be the
gallery, with the leftover sequences becoming the probe.
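
The structure of this protocol can be sketched as two small helpers. The names `protocol_rounds` and `gallery_probe_trials` are hypothetical, and the groups are passed in as plain lists of sequence names rather than being hard-coded.

```python
def protocol_rounds(g1, g2):
    """Each group plays the development and the evaluation role in turn."""
    return [
        {"development": g1, "evaluation": g2},
        {"development": g2, "evaluation": g1},
    ]


def gallery_probe_trials(group):
    """Yield (gallery, probes): every sequence is the gallery once,
    with the remaining sequences of the same group acting as probes."""
    for index, gallery in enumerate(group):
        yield gallery, group[:index] + group[index + 1:]
```

With K sequences in a group this gives K gallery/probe trials per group, and the reported figure is the average verification rate over the two rounds.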



³ http://itee.uq.edu.au/~conrad/chokepoint.html
Figure 3. An example of the recording setup used for the ChokePoint
dataset. A camera rig contains 3 cameras placed just above a
door, used for simultaneously recording the entry of a person from 
3 viewpoints. The variations between viewpoints allow for varia- 
tions in walking directions, facilitating the capture of a near-frontal 
face by one of the cameras. 




Figure 4. Example shots from the ChokePoint dataset, showing 
portals with various backgrounds. 



Table 1. ChokePoint video-to-video verification protocol. Sequences
are divided into two groups (G1 and G2). Listed sequences
contain faces with the most frontal pose view. P, S, and C
stand for portal, sequence and camera, respectively. E and L
indicate subjects entering or leaving the portal. The numbers indicate
the respective portal, sequence and camera label. For example,
P2L_S1_C3 indicates that the recording was done in Portal 2, with
people leaving the portal, and captured by camera 3 in the first
recorded sequence.

G1    P1E_S1_C1    P1E_S2_C2    P2E_S2_C2    P2E_S1_C3
      P1L_S1_C1    P1L_S2_C2    P2L_S2_C2    P2L_S1_C1

G2    P1E_S3_C3    P1E_S4_C1    P2E_S4_C2    P2E_S3_C1
      P1L_S3_C3    P1L_S4_C1    P2L_S4_C2    P2L_S3_C3




4. Experiments on Still Images 

In this section, we evaluate how well the proposed qual- 
ity assessment method can identify the best quality faces 
when presented with both good and poor quality faces. 
The proposed method was compared with: (i) a score fu- 
sion method using pixel based asymmetry analysis and 
two sharpness analyses (denoted as Asym_shrp) [26],
(ii) asymmetry analysis with Gabor features (denoted as 
Gabor_asym) [29], (iii) the classical Distance From Face 
Space (DFFS) method [5]. 

The 'fa' subset of FERET, containing frontal faces with 
frontal illumination and neutral expression, was used to 
train the location specific probabilistic models in the pro- 
posed method. The 'fa' subset was also used to select the 
decision threshold for rejecting "poor" quality images. The 
'fa' subset was not used for any other purposes. 

Based on preliminary experiments, closely cropped face 
images were scaled to 64 x 64 pixels, the block size was 
set to 8 x 8 pixels, with a 7 pixel overlap of neighbouring
blocks. The preliminary experiments also suggested that 
using just 3 DCT coefficients was sufficient. This configu- 
ration was used in all experiments. The quality assessment 
methods were implemented with the aid of the Armadillo 
C++ library [27]. 
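
As a quick check of what this configuration implies, the number of block locations follows directly from the image size, block size and overlap. The helper name `blocks_per_side` is hypothetical.

```python
def blocks_per_side(image_size, block_size, overlap):
    """Number of block positions along one image dimension."""
    step = block_size - overlap  # distance between neighbouring block origins
    return (image_size - block_size) // step + 1


side = blocks_per_side(image_size=64, block_size=8, overlap=7)
print(side, side * side)  # prints "57 3249"
```

With a 7 pixel overlap the blocks advance one pixel at a time, so a 64 x 64 face gives 57 x 57 = 3249 block locations, each with its own location specific model and a 3-dimensional DCT feature vector.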

4.1. Quality Assessment of Faces with Variations in 
Alignment, Scale and Sharpness 

In this experiment we evaluated the efficacy of each 
method to detect the best aligned images within a set of 
images that have a particular image variation. For exam- 
ple, out of the set of faces with rotations of 0°, ±10°, ±20°, 
±30°, we measured the percentage of 0° faces that were la- 
belled as "high" quality. 
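
One way to compute such a percentage is sketched below, under the assumption that each face contributes one set of quality scores (one score per variation level) and that a set counts as a hit when the reference level receives the highest score. The function name and the data layout are hypothetical.

```python
def best_selected_rate(score_sets, reference):
    """Percentage of sets in which the reference variation scores highest.

    score_sets: list of dicts mapping a variation level (eg. a rotation in
    degrees) to the quality score of that version of a face.
    Ties are resolved in favour of the first level listed in the dict.
    """
    if not score_sets:
        raise ValueError("at least one score set is required")
    hits = sum(
        1 for scores in score_sets if max(scores, key=scores.get) == reference
    )
    return 100.0 * hits / len(score_sets)
```

For in-plane rotation, `reference` would be 0 and each dict would hold scores for the seven rotation levels.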

Results for variations in shift, rotation and scale, shown 
in Table 2, indicate that the proposed method consistently 
achieved the best or near-best performance across most of 
the variations. The results on the six PIE illumination sub- 
sets indicate that even in the presence of cast shadows, the 
proposed method can achieve good results, with the excep- 
tion of images with scale changes. Averaging over all vari- 
ations, the proposed method achieved the best results. 
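
The 'overall' figures in Table 2 are plain averages of the five variation types, which can be verified with a few lines. The values below are copied from the FERET 'fb' part of Table 2; the variable and function names are hypothetical.

```python
# (HS, VS, RT, SC, SH) and the reported 'overall' value, FERET 'fb' columns.
FERET_FB = {
    "Asym_shrp": ((44.4, 7.7, 79.8, 7.4, 100.0), 47.9),
    "Gabor_asym": ((52.1, 3.1, 93.9, 11.5, 49.0), 41.9),
    "DFFS": ((75.6, 71.9, 98.7, 62.5, 0.7), 61.9),
    "Proposed": ((83.4, 85.4, 99.6, 73.0, 99.8), 88.2),
}


def mean(values):
    return sum(values) / len(values)


for method, (per_variation, reported) in FERET_FB.items():
    # Each computed mean agrees with the reported value to one decimal place.
    print(f"{method}: mean {mean(per_variation):.2f}, reported {reported}")
```

The same check holds for the PIE illumination columns.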

The asymmetry-based analysis methods (Gabor_asym
and Asym_shrp) could not reliably detect vertical alignment
errors and scale variations. Gabor_asym also performed
poorly for detecting images with various sharpness
variations. Asym_shrp addressed this by combining asymmetry
analysis with two image sharpness measurements.
Despite that, the overall performance of Asym_shrp was still
poor.

The performance of DFFS on alignment errors was con- 
sistent but generally lower than the proposed method. No- 
tably, DFFS failed to detect images with the best sharpness. 



Table 2. Quality assessment of alignment errors and sharpness variations on FERET 'fb' and all six PIE illumination subsets. Each value
in the table indicates the percentage of the best aligned image in each variation type being assigned to have the highest quality score. For
example, out of the set of faces with rotations of 0°, ±10°, ±20°, ±30°, the value indicates the percentage of 0° faces labelled as "high"
quality. The variations included: horizontal shift (HS), vertical shift (VS), in-plane rotation (RT), scale (SC), sharpness (SH). The 'overall'
columns indicate the average performance of the above variations. Best performance in each column is marked with an asterisk (*).

                 FERET 'fb'                                              PIE illumination
                      HS       VS       RT       SC       SH  overall         HS       VS       RT       SC       SH  overall
Asym_shrp [26]      44.4      7.7     79.8      7.4    100.0*    47.9       10.3      4.0     40.4      2.4    100.0*    31.4
Gabor_asym [29]     52.1      3.1     93.9     11.5     49.0     41.9       24.7      1.5     66.4     10.7     29.0     26.5
DFFS [5]            75.6     71.9     98.7     62.5      0.7     61.9       64.4     62.4     99.6*    44.4*     2.3     54.6
Proposed            83.4*    85.4*    99.6*    73.0*    99.8     88.2*      65.9*    62.6*    98.8     37.0     95.9     72.0*



Table 3. Quality assessment of pose variations on the pose subsets of FERET and PIE. Each value in the table indicates the percentage of
images with a particular pose angle that were assigned to have the highest quality score. Best performance at 0° is marked with an
asterisk (*). Empty cells are shown as '-'.

FERET pose subset
                    -60°    -40°    -25°    -15°      0°    +15°    +25°    +40°    +60°
Asym_shrp [26]        -       -      0.5    30.5    68.0      1       -       -       -
Gabor_asym [29]       2      5.5     7.5    24.5    58.0     2.5      -       -       -
DFFS [5]              -       -       -       5     92.0*     3       -       -       -
Proposed              -       -      0.5     28     68.5      3       -       -       -

PIE pose subset
                  -67.5°    -45°  -22.5°      0°  +22.5°    +45°  +67.5°
Asym_shrp [26]        -       -     2.94    94.1*    1.5     1.5      -
Gabor_asym [29]       -      8.8    10.3    73.5     5.9     1.5      -
DFFS [5]              -      1.5    11.8    79.4     7.4      -       -
Proposed              -       -      4.4    91.2     4.4      -       -







4.2. Quality Assessment on Pose Variations 

In this experiment we evaluated the ability of each 
method to detect the most frontal faces in a set that in- 
cluded frontal and non-frontal (out-of-plane rotated) faces. 
The results, shown in Table 3, indicate that the proposed 
method consistently achieves second best performance on 
both FERET and PIE, with its performance on PIE being 
quite close to the top performer (Asym_shrp). 

We note that on FERET the visual differences between 
faces at 0° and ±15° are minimal, which can explain why 
a significant proportion of faces at -15° was classified as 
"frontal" by the proposed method. 

While DFFS gave the best performance on FERET, its 
performance dropped on PIE. As there is an overlap be- 
tween the subjects in the 'fa' and pose subsets in FERET 
(where 'fa' was used for training), the inconsistency in per- 
formance across FERET and PIE suggests that DFFS might 
be overfitted to the training dataset.



The performance of Asym_shrp and the proposed 
method is considerably better on PIE than on FERET. We 
conjecture that this is due to the larger pose variation be- 
tween frontal faces and faces with the smallest pose angle 
(±22.5°), in contrast to ±15° on FERET. 



Table 4. Quality assessment of images with cast shadows from
the PIE dataset. Each value in the table indicates the percentage of
images with a particular illumination direction that were assigned
to have the highest quality score. The illumination ranged from
frontal (subset 1) to strongly directed (subset 6) where there are
strong shadows (see Fig. 2). Empty cells are shown as '-'.

                  PIE illumination subset
                       1       2       3       4       5       6
Asym_shrp [26]      97.1     2.9      -       -       -       -
Gabor_asym [29]     51.5     5.9     2.9    39.7     4.4      -
DFFS [5]              -       -      4.4    88.2     7.4      -
Proposed            94.1     5.9      -       -       -       -




4.3. Quality Assessment on Cast Shadow Variations 

Here we evaluated the accuracy of selecting frontal face 
images with the least amount of cast shadow within
a set of images subject to varying illumination direction. 
The direction ranged from frontal (subset 1) to side (sub- 
set 6), where severe cast shadows exist (as shown in Fig. 2). 

The results, presented in Table 4, show that Asym_shrp 
achieved the best performance (correctly labelling frontally 
illuminated faces as having high quality), with the proposed 
method a close second. In contrast, Gabor_asym was confused
between subsets 1 and 4, while DFFS erroneously labelled
most faces in subset 4 (containing significant shadows)
as having the highest quality.

5. Experiments on Video: Subset Selection 

In this section, we study the effectiveness of using qual- 
ity measurements to select a subset of images for video- 
based face verification. To demonstrate the effectiveness 
of the quality assessment for a variety of face recognition 
systems, we used two facial feature extraction algorithms 
and two classification techniques, specifically designed for 
dealing with sets of faces (ie. image set matching). 

Specifically, we separately used Multi-Region His- 
tograms (MRH) [28] and Local Binary Patterns (LBP) [4] 
to extract features from each face. The comparison between
two sets of faces was performed using (i) Mutual
Subspace Method (MSM) [37] (for both MRH and LBP),
and (ii) feature averaging [8, 20] (for MRH only).

The experiments were conducted on the ChokePoint 
dataset, using the video-to-video protocol (see Sec. 3.2). 
Each set of face images for a particular person was rank 
ordered according to the quality scores of the images, fol- 
lowed by keeping the top N images. 
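
The subset selection step amounts to a sort followed by truncation. A minimal sketch, with the hypothetical helper `select_top_n` and with faces and scores given as parallel lists, is shown below.

```python
def select_top_n(faces, scores, n):
    """Return the n faces with the highest quality scores, best first.

    faces and scores are parallel sequences; faces with equal scores keep
    their original relative order. If n exceeds the set size, all faces
    are returned.
    """
    if len(faces) != len(scores):
        raise ValueError("faces and scores must have the same length")
    order = sorted(range(len(faces)), key=lambda i: scores[i], reverse=True)
    return [faces[i] for i in order[:n]]
```

Since the quality score is a sum of log-probabilities, scores are typically negative; only their ordering matters for the selection.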

As per Section 4, the proposed face quality measure- 
ment method was compared against three other methods: 
Asym_shrp, Gabor_asym and DFFS. The 'fa' subset of 
FERET, which is totally independent from ChokePoint, was 
used for training DFFS and the proposed quality measure- 
ment method. 

In the first experiment, N varied from 4 to 16. The results,
reported in Table 5, indicate that the proposed quality
measurement method consistently leads to better face verification
performance than the other three methods, regardless
of the facial feature extraction algorithm used. The improvement
is most pronounced for N = 4, indicating that the
proposed method assigns high scores to high quality images
more accurately.

In the second experiment, N varied from 1 to the size of 
the set (labelled as "all"). Each face set was represented by 
an average MRH signature; LBP feature extraction was not 
used as it isn't suitable for feature averaging. Face sets were 
then compared by using an L1-norm based distance between
their corresponding average MRH signatures [8, 20, 28]. 
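
Feature averaging followed by an L1-norm comparison can be sketched as follows. For illustration each signature is treated as a flat list of numbers, which simplifies the actual MRH representation, and the function names are hypothetical.

```python
def average_signature(signatures):
    """Element-wise mean of equally sized feature vectors (one per face)."""
    if not signatures:
        raise ValueError("at least one signature is required")
    count = len(signatures)
    return [sum(column) / count for column in zip(*signatures)]


def l1_distance(a, b):
    """L1-norm of the difference between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def face_set_distance(set_a, set_b):
    """Distance between two face sets via their average signatures."""
    return l1_distance(average_signature(set_a), average_signature(set_b))
```

A smaller distance indicates that the two face sets are more likely to belong to the same person; the decision threshold would be set on the development group.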



Table 5. Video-based face verification performance on the ChokePoint
dataset, using MRH and LBP feature extraction algorithms
coupled with the Mutual Subspace Method (MSM) for classifying
face sets. Each set of face images for a particular person was
rank ordered according to the quality scores of the images, followed
by retaining top N quality images (ie. N is the subset size).
Faces were segmented using automatic face localisation (detection).
The average face verification rate is reported (see Sec. 3.2).
Best performance is marked with an asterisk (*).

Subset Selection    Recognition Method
Method              MRH + MSM                   LBP + MSM
                     N=4      N=8     N=16       N=4      N=8     N=16
Asym_shrp [26]      67.5     70.3     75.4      65.3     67.6     70.5
Gabor_asym [29]     75.4     78.6     84.0      69.3     71.4     74.5
DFFS [5]            74.7     78.1     83.4      69.4     70.3     74.6
Proposed            82.5*    84.5*    86.7*     73.5*    74.7*    75.8*

Figure 5. Video-based face verification performance on the
ChokePoint dataset using average MRH signatures. Each set of 
face images for a particular person was rank ordered according to 
the quality scores of the images, followed by selecting a predefined 
number of top quality images to create a subset. Faces were seg- 
mented using automatic face localisation (detection). The average 
face verification rate is reported (see Sec. 3.2). 



From the results shown in Fig. 5, it can be observed that 
using all captured faces generally does not lead to the best 
performance. It can also be observed that the proposed 
method considerably outperforms the other three methods 
for N < 32, and furthermore leads to the best verification
performance (which occurs at N = 16). We note that even
when only one face is selected by the proposed method 
(ie. N = 1), relatively high verification accuracy is still 
achieved. This suggests that the proposed method has a high 
chance of picking the "best" face out of a set of faces. 



6. Main Findings 
In this paper we presented a novel patch-based face image
quality assessment algorithm. Unlike previous methods,
the proposed approach is capable of simultaneously
handling issues such as pose variations, cast shadows, blur- 
riness as well as alignment errors caused by automatic face 
localisation (eg. in-plane rotations, horizontal and vertical 
shifts). 

The proposed method was evaluated on two still face 
datasets (FERET and PIE), using faces subject to pose and 
illumination direction changes, as well as simulated geo- 
metric alignment errors and decreased sharpness. Exper- 
iments show that the proposed method has the best overall 
performance, identifying images which are the most frontal, 
well-aligned, illuminated and sharp. This is accomplished 
without requiring parameter tuning or retraining for each 
dataset tested. 

The proposed method was also evaluated in a video- 
based face verification setting, on a new surveillance dataset 
termed ChokePoint. For each given set of face images for 
a person, the proposed method was used to rank the images 
according to their quality. By selecting a subset containing 
only the top quality images, verification accuracy was con- 
siderably improved when compared to using all available 
images. Furthermore, the proposed method consistently led 
to higher quality subsets (leading to higher verification ac- 
curacy) than previous image quality assessment algorithms. 

The proposed method is capable of assigning low quality
scores to images with cast shadows (eg. due to self-shadowing
caused by strong directed illumination); however,
it is currently unlikely to detect more subtle variations
in illumination. This is due to its elaborate illumination 
normalisation steps, necessary for generalisation purposes 
(ie. not being tied to the level of contrast and/or illumi- 
nation bias in a particular training dataset). The proposed 
method is also unlikely to detect minor expression varia- 
tions, as only low frequency information is used. According 
to [3, 33], expression changes mainly lie in high frequency 
bands. However, many of the recent face recognition algo- 
rithms are capable of handling relatively minor variations in 
both illumination and expression [15, 17, 24, 28], thus these 
characteristics of the quality assessment method might be 
more of a feature than a limitation. 